Load the data dataCar from the package “insuranceData”.
It represents claim data on vehicle insurance policies from 2004 to
2005. Some variables like “gender” describe the policy holder, others
like “veh_age” the vehicle, and some variables carry information on
claims, e.g. “numclaims”. Each row represents policy information valid
in a certain time window. Use the pipe, “dplyr”, and “ggplot2” to solve
the following tasks.
library(tidyverse)
library(insuranceData)
library(plotly)
data(dataCar)
str(dataCar)
## 'data.frame': 67856 obs. of 11 variables:
## $ veh_value: num 1.06 1.03 3.26 4.14 0.72 2.01 1.6 1.47 0.52 0.38 ...
## $ exposure : num 0.304 0.649 0.569 0.318 0.649 ...
## $ clm : int 0 0 0 0 0 0 0 0 0 0 ...
## $ numclaims: int 0 0 0 0 0 0 0 0 0 0 ...
## $ claimcst0: num 0 0 0 0 0 0 0 0 0 0 ...
## $ veh_body : Factor w/ 13 levels "BUS","CONVT",..: 4 4 13 11 4 5 8 4 4 4 ...
## $ veh_age : int 3 2 2 2 4 3 3 2 4 4 ...
## $ gender : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 2 1 1 ...
## $ area : Factor w/ 6 levels "A","B","C","D",..: 3 1 5 4 3 3 1 2 1 2 ...
## $ agecat : int 2 4 2 2 2 4 4 6 3 4 ...
## $ X_OBSTAT_: Factor w/ 1 level "01101 0 0 0": 1 1 1 1 1 1 1 1 1 1 ...
head(dataCar)
### a)
Draw barplots of the discrete variables “numclaims”, “agecat” (categorized driver age), and “gender”.
dataCar %>% ggplot(mapping = aes(x = numclaims)) +
geom_bar(fill = "navyblue")
dataCar %>% ggplot(mapping = aes(x = agecat)) +
geom_bar(fill = "navyblue")
dataCar %>% ggplot(mapping = aes(x = gender)) +
geom_bar(fill = "navyblue")
### b)
Draw a histogram of the vehicle value “veh_value” (in 10,000 Australian dollars). Truncate values above 7 (that is, set every value larger than 7 to 7).
dataCar %>%
  mutate(veh_value = pmin(veh_value, 7)) %>%  # cap values above 7 at 7
  ggplot(mapping = aes(x = veh_value)) +
  geom_histogram(fill = "navyblue", bins = 30)
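Truncating at 7 is an element-wise minimum. The following minimal sketch uses a made-up vector (not values from dataCar) and also checks that the indicator arithmetic `(x > 7) * 7 + x * (x <= 7)` gives the same result as `pmin()`:

```r
# Made-up vehicle values for illustration only
veh_value <- c(0.5, 1.2, 7, 9.3, 34.56)

# Element-wise minimum of each value and the cap 7
truncated <- pmin(veh_value, 7)

# Equivalent indicator arithmetic: 7 where x > 7, x otherwise
indicator_form <- (veh_value > 7) * 7 + veh_value * (veh_value <= 7)

print(truncated)
print(all.equal(truncated, indicator_form))
```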
### c)
Calculate the average number of claims per level of “agecat” and visualize the result as a scatterplot. Interpret the result.
dataCar %>%
group_by(agecat) %>%
summarize(avg_claims = mean(numclaims)) %>%
ggplot(mapping = aes(x = agecat, y = avg_claims)) +
geom_point(color = "navy")
The older the policyholder, the lower the average number of claims. This makes sense, since younger drivers may drive more recklessly than older ones.
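Since each row is valid only for a certain time window, the plain average of “numclaims” treats short and long windows alike. A possible refinement, sketched here on made-up data and assuming that “exposure” measures the fraction of a year a policy was in force, is to divide the total number of claims by the total exposure per group:

```r
library(dplyr)

# Made-up toy policies (not taken from dataCar)
policies <- data.frame(
  agecat    = c(1, 1, 2, 2),
  exposure  = c(0.5, 0.5, 1.0, 1.0),
  numclaims = c(1, 0, 1, 0)
)

freq <- policies %>%
  group_by(agecat) %>%
  summarize(
    avg_claims = mean(numclaims),                 # plain average per row
    claim_freq = sum(numclaims) / sum(exposure)   # claims per unit of exposure
  )

print(freq)
```

In this toy example both groups have the same plain average, but the group with shorter windows has the higher claim frequency per unit of exposure.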
### d)
Bin “veh_value” into quartiles and analyze its association with the number of claims as in 1c.
summary(dataCar$veh_value)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.010 1.500 1.777 2.150 34.560
plot <- dataCar %>% mutate(veh_value_bin = ntile(veh_value, n=4)) %>%
group_by(veh_value_bin) %>%
summarize(avg_claims = mean(numclaims)) %>%
ggplot(mapping = aes(x = veh_value_bin, y = avg_claims)) +
geom_point(color = "navy")
plot
The higher the vehicle value, the higher the average number of claims.
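`ntile()` assigns rows to four groups of (nearly) equal size by rank. An alternative, sketched here in base R on made-up values, bins by the empirical quartiles themselves using `quantile()` and `cut()`. This is only an illustration; with many tied values the quartile breaks may coincide, in which case `cut()` would need de-duplicated breaks.

```r
# Made-up values for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Empirical 0%, 25%, 50%, 75%, and 100% quantiles as bin boundaries
breaks <- quantile(x, probs = seq(0, 1, by = 0.25))

# labels = FALSE returns the bin index (1 to 4) instead of interval labels;
# include.lowest = TRUE keeps the minimum in the first bin
bins <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

print(bins)
```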
### e)
Use the “plotly” package to make the plot from d. interactive.
ggplotly(plot)
The sieve of Eratosthenes is an ancient algorithm to get all prime
numbers up to any given limit n, see Wikipedia.
Write a function sieve_of_eratosthenes(n) that returns all
prime numbers up to n. Benchmark the
results for n = 10^5 with the package
“microbenchmark”. Mind your coding style!
sieve_of_eratosthenes <- function(n) {
  # sieve[k] stays TRUE as long as k is still a prime candidate
  sieve <- !logical(n)
  i <- 2
  while (i <= sqrt(n)) {
    if (sieve[i]) {
      # Cross out the multiples of i, starting at i^2
      j <- i^2
      while (j <= n) {
        sieve[j] <- FALSE
        j <- j + i
      }
    }
    i <- i + 1
  }
  out <- which(sieve)
  out[out != 1]  # 1 is not a prime
}
library(microbenchmark)
res <- microbenchmark(sieve_of_eratosthenes(10^5), times = 100)
print(res)
## Unit: milliseconds
## expr min lq mean median uq max
## sieve_of_eratosthenes(10^5) 12.0717 17.20875 18.30951 18.05325 19.3729 28.5984
## neval
## 100
ggplot2::autoplot(res)
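The inner `while` loop can also be replaced by a single vectorized assignment, which is usually the more idiomatic choice in R. The following is a sketch of such a variant; the name `sieve_vectorized` is made up for illustration, and no timings are claimed here, so it would need to be benchmarked in the same way to compare.

```r
# Hypothetical vectorized variant of the sieve (name chosen for illustration)
sieve_vectorized <- function(n) {
  if (n < 2) {
    return(integer(0))
  }
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE  # 1 is not a prime
  i <- 2
  while (i * i <= n) {
    if (is_prime[i]) {
      # Cross out i^2, i^2 + i, i^2 + 2i, ... in one step
      is_prime[seq(i * i, n, by = i)] <- FALSE
    }
    i <- i + 1
  }
  which(is_prime)
}

primes_30 <- sieve_vectorized(30)
print(primes_30)  # 2 3 5 7 11 13 17 19 23 29
```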
In Exercise 1c, we calculated and plotted the average number of
claims per level of “agecat” in the dataCar data.
a. Write a function avg_claim_counts(v) that provides such a
visualization for any discrete variable v.
b. Extend this function with a second argument interactive to control
whether the resulting plot is interactive or not.
avg_claim_counts <- function(v, interactive = FALSE) {
  # v: name of a discrete column of dataCar, given as a string
  p <- dataCar %>%
    group_by(across(all_of(v))) %>%
    summarize(avg_claims = mean(numclaims)) %>%
    ggplot(mapping = aes(x = .data[[v]], y = avg_claims)) +
    geom_point(color = "navy")
  if (interactive) {
    p <- ggplotly(p)
  }
  p
}
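The function receives the column name as a string, which is why it uses `across(all_of(v))` for grouping and `.data[[v]]` inside `aes()`. The sketch below shows the same pattern on a made-up data frame; the helper `avg_claims_by()` and the data `toy` are hypothetical and exist only so that the pattern can be tried without dataCar.

```r
library(dplyr)
library(ggplot2)

# Made-up toy data standing in for dataCar
toy <- data.frame(
  gender    = c("F", "F", "M", "M"),
  numclaims = c(0, 1, 0, 0)
)

# Hypothetical helper: group by a column whose name is passed as a string
avg_claims_by <- function(data, v) {
  data %>%
    group_by(across(all_of(v))) %>%
    summarize(avg_claims = mean(numclaims), .groups = "drop")
}

tab <- avg_claims_by(toy, "gender")
print(tab)

# .data[[v]] looks up the column by its string name inside aes()
v <- "gender"
p <- ggplot(tab, mapping = aes(x = .data[[v]], y = avg_claims)) +
  geom_point(color = "navy")
```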
Extend the “student” class from Section “plot, print, summary” by the optional information “semester”. It represents the number of semesters the student has been registered. Add a summary() method that neatly prints the name and the semester of the student.
student <- function(given_name, family_name, semester = NULL) {
out <- list(
given_name = given_name,
family_name = family_name,
semester = semester
)
class(out) <- "student"
out
}
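summary.student <- function(object, ...) {
  # sep = "" prevents cat() from adding extra spaces between the pieces
  cat("Name: ", object$given_name, " ", object$family_name, "\n", sep = "")
  cat("Semester: ", object$semester, "\n", sep = "")
  invisible(object)
}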
# Another option: define the method via the S4 system with setMethod(),
# after registering the S3 class with setOldClass("student")
# setMethod("summary", "student", function(object) {
# cat("Name: ", object$given_name, " ", object$family_name, "\n")
# cat("Semester: ", object$semester, "\n")
# })
me <- student("Tobias", "Hugentobler", 2)
summary(me)
## Name: Tobias Hugentobler
## Semester: 2
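Because “semester” is optional, the summary could also skip the semester line when no value was given. The following self-contained sketch shows one possible way to do this; the `is.null()` check and the example student are additions for illustration and not part of the original solution.

```r
# Constructor as in the exercise
student <- function(given_name, family_name, semester = NULL) {
  out <- list(
    given_name = given_name,
    family_name = family_name,
    semester = semester
  )
  class(out) <- "student"
  out
}

# Variant of the summary method that omits a missing semester
summary.student <- function(object, ...) {
  cat("Name: ", object$given_name, " ", object$family_name, "\n", sep = "")
  if (!is.null(object$semester)) {
    cat("Semester: ", object$semester, "\n", sep = "")
  }
  invisible(object)
}

# capture.output() collects the printed lines so they can be inspected
with_semester <- capture.output(summary(student("Ada", "Lovelace", 3)))
without_semester <- capture.output(summary(student("Ada", "Lovelace")))

print(with_semester)
print(without_semester)
```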